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Abstract 

Methods that integrate population-level sampling from multiple taxa into a single community-level analysis are an 
essential addition to the comparative phylogeographic toolkit. Detecting how species within communities have demo* 
graphically tracked each other in space and time is important for understanding the effects of future climate and 
landscape changes and the resulting acceleration of extinctions, biological invasions, and potential surges in adaptive 
evolution. Here, we present a statistical framework for such an analysis based on hierarchical approximate Bayesian 
computation (hABC) with the goal of detecting concerted demographic histories across an ecological assemblage. Our 
method combines population genetic data sets from multiple taxa into a single analysis to estimate: 1) the proportion of a 
community sample that demographically expanded in a temporally clustered pulse and 2) when the pulse occurred. To 
validate the accuracy and utility of this new approach, we use simulation cross-validation experiments and subsequently 
analyze an empirical data set of 32 avian populations from Australia that are hypothesized to have expanded from 
smaller refugia populations in the late Pleistocene. The method can accommodate data set heterogeneity such as 
variability in effective population size, mutation rates, and sample sizes across species and exploits the statistical strength 
from the simultaneous analysis of multiple species. This hABC framework used in a multitaxa demographic context can 
increase our understanding of the impact of historical climate change by determining what proportion of the community 
responded in concert or independently and can be used with a wide variety of comparative phylogeographic data sets as 
biota-wide DNA barcoding data sets accumulate. 

Key words: comparative phylogeography, approximate Bayesian computation, historical demography, response to 
climate change. 



Introduction 

Fluctuations in climate during the Quaternary resulted in 
widespread expansions and contractions of ice sheets, re- 
gional shifts in temperature and precipitation, and changes 
in sea levels and sea surface temperatures (Yu and Eicher 
1998; Cutler et al. 2003; Pahnke et al. 2003). These fluctuations 
brought about profound changes in population sizes, species 
range distributions, and geographic patterns of genetic diver- 
sity in nearly all species (Graham et al. 1996; Comes and 
Kadereit 1998; Dynesius and Jansson 2000; Hewitt 2000, 
2004; Davis and Shaw 2001; Hill et al. 2011). Characterizing 
the complex dynamics and aggregate demographic histories 
associated with expansions and contractions of large species 
assemblages is a daunting yet vital component of learning 
how climatic change can drive patterns of meta-comm unity 
assembly in various regions (HilleRisLambers et al. 2012). 

Despite the large number of phylogeographic studies ex- 
amining species response to historical changes that have 



emerged since the field's inception in the 1980s, there remains 
a pressing need for the integration of multispecies data sets 
within a cohesive statistical framework (Knowles 2009). 
Methods that use genetic data from whole assemblages can 
then be used to determine the community-level patterns of 
both individualistic and aggregate responses to climate cycles. 
Although community-level inference (Taberlet et al. 2002; 
Soltis et al. 2006; Drew and Barber 2012; Wood et al. 2012) 
from aggregate parameter estimates from each species and/or 
qualitative comparisons of phylogenetic trees or phylogeo- 
graphic networks from each species can provide useful infer- 
ences about community responses to changes in climate and 
landscape (Sullivan et al. 2000; Soltis et al. 2006), a unified 
statistical framework that can pool information across species 
would be broadly beneficial. 

In this study, we present a new and generally applicable 
multispecies population genetic method that is suitable for 
aggregate demographic histories of any community provided 
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the availability of population-level sampling of single locus 
data across codistributed species. This general method can 
potentially make use of the plethora of data from nearly 25 
years of phylogeographic studies (Hickerson et al. 2010) and 
mitochondrial DNA (mtDNA) barcode initiatives (17,505 
chordate species; Ratnasingham and Hebert 2007, as of May 
2014). 

Under this strategy, this multitaxon statistical framework 
makes community-level inference while accommodating co- 
alescent, mutational, and demographic variance associated 
with individual species and loci (Beaumont 2010; Morgan 
et al. 2011) and gains statistical strength from pooling data 
for a single analysis (Beaumont and Rannala 2004). 
Specifically, our multitaxon statistical framework uses hierar- 
chical approximate Bayesian computation (hABC). Under 
complex models that make calculating the likelihood 
function difficult, ABC uses a comparison of observed to sim- 
ulated summary statistics to sidestep the calculation of the 
likelihood function and develop approximate likelihoods for 
model choice and parameters (Sunnaker et al. 2013). hABC is 
an ABC extension of any hierarchical Bayesian model where 
multiple parameters are structured into multiple dependent 
levels whereby observable outcomes are modeled conditional 
on certain parameters, which themselves are given probabil- 
istic specification in terms of higher level parameters, known 
as hyperparameters (Gelman et al. 2004). Hierarchical models 
are particularly useful when there are a number of quantities 
such as loci or populations, and it is unknown whether they 
should be parameterized the same way or independently 
(Beaumont 2010). 

However, up to now, most hABC studies in a comparative 
phylogeographic context focused on studies of codivergence 
(Hickerson and Meyer 2008; Carnaval et al. 2009; Barber and 
Klicka 2010; Morgan et al. 2011) and local adaptation (Bazin 
et al. 2010). The method we introduce here is the first 
multispecies coalescent model-based method for the demo- 
graphic inference of coordinated and/or independent 
multipopulation expansion histories, thereby allowing large- 
scale inferences relevant to questions about community 
assembly and variable responses to future climate changes. 

The hABC Approach 

Approximate Bayesian computation is particularly well suited 
to address the complexity of communities and natural pop- 
ulations where evaluation of the likelihood function is difficult 
(Csillery et al. 2010). Often based on the rejection algorithm 
(Tavare et al. 1997; Pritchard et al. 1999), it involves the 
simulation of large numbers of data sets under different 
hypothesized scenarios with parameter values drawn from a 
prior. Simulated and observed data sets are reduced to sum- 
mary statistics and then posterior probability distributions are 
approximated from the comparisons between the simulated 
and observed summary statistics. Although it has been used 
frequently in population and evolutionary genetics to 
estimate species-specific parameters such as effective popula- 
tion size and past demographic events such as growth, de- 
cline, and migration under complex demographic histories 
(Beaumont 2010), its application to community-wide 



dynamics is relatively recent (Hickerson et al. 2006) where it 
has been used to test for simultaneous divergence among 
sister taxa. Here, we infer community-wide dynamics based 
on a demographic history of concerted response. Our goal 
was to provide a tool to determine what proportion of a 
community responded simultaneously to a hypothesized his- 
torical event, such as a sea level change or expansion out of 
refugia, and to estimate the timing of the response. 

We present an hABC scheme for detecting the presence 
and timing of multispecies coexpansion pulses at the time 
scale of the late Quaternary (table 1). We harness this ap- 
proach to infer the proportion of a contemporary community 
that coexpanded simultaneously, as well as when that expan- 
sion occurred. Rather than compiling and qualitatively com- 
paring a set of single-species inferences, this hABC method 
allows estimation of hyperparameters that quantify multispe- 
cies patterns, such as synchronicity in expansion time. By 
combining the data in a single hierarchical Bayesian analysis, 
we incorporate uncertainty in the amount of dependency 
among taxon-specific parameters, thereby allowing for both 
historical congruence and demographic independence across 
taxa (Beaumont 2010). 

First, we apply the hABC framework in extensive simula- 
tion experiments to validate the method and to examine the 
robustness of our model estimates and parameter estimation 
to different sizes of community samples and maximum ex- 
pansion times. We introduce a model index hyperparameter 
of community congruence ^, which ranges from complete 
simultaneous expansion (^ = 1.0), where all n species ex- 
panded at the same time to complete random expansion 
(^ = 0.0), where each of the n species expanded independently 
at times t,. For intermediate models, ^ is proportional to the 
number of species that coexpand synchronously with (1— Q 
being proportional to the number of species that expand 
individualistically at random times r,. All populations have 
a demographic model that consists of a contemporary effec- 
tive size of Nj that is independently drawn from a uniform 
prior (-/(Nmin, Nmax), with every population instantaneously 
expanding from populations that are a fraction of their cur- 
rent size £j chosen from a uniform prior of expansion mag- 
nitudes U(emin, fimax) at times T, or that have a uniform 
prior distribution of Uir^m, t^ax) (fig- 1)- Although each of 
the n taxa's effective population size N, and expansion mag- 
nitude e, are free to vary, the method allows estimating the 
following: 1) ^, the proportion of the n taxa that synchro- 
nously expand; 2) £(t), the mean expansion time of n taxa; 3) 
Ts, the coexpansion time of the synchronous taxa; and 4) the 
dispersion index of all n expansion times, Var(T)/£(T). 

To demonstrate the method's bias and accuracy given that 
the model is correct, we performed a series of cross-validation 
tests by simulating data sets having fixed values for the com- 
munity congruence hyperparameter index ^ (pseudo-ob- 
served data sets [PODs]; Bertorelle et al. 2010) and 
subsequently estimating ^, E(t), r^, and Var{r)/E{r) using 
the ABC accept-reject method. To investigate the method's 
performance when the model is incorrect (i.e., model gener- 
ating the data differs from the model used for inference) and 
if such incorrect models could possibly be detected prior to 
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Table 1. Hierarchical Approximate Bayesian Computation Procedure Outlined for Comparative Phylogeographic Inference of Concerted 
Demographic Response. 



Objectives 


• Estimate the proportion of n taxa that coexpanded synchronously (Q 




• Estimate mean time of expansion £(t), coexpansion time r„ and time-dispersion Var(T)/£(T) 


Draw hyperparameter value from 


1) For each data set of n taxa, draw a value of C, representing ||nQ| taxa that are coexpanding synchro- 


prior 


nously across n taxa 


Draw taxon-speclfic parameter 


2) Draw an expansion time z„ and for each nQ| taxa expanding individuaiistically and a single 


values from prior 


from the prior for all \\nQ\ taxa that are coexpanding synchronously. Assign each of the n taxa an ef- 




fective population size (N), expansion magnitude (e), and mutation rate (u) drawn from taxon-specific 




priors 


Simulate data for each taxon and 


3) Simulate data based on sample sizes, sequence lengths, and taxon-specific parameter values for all the 


calculate summary statistics 


n taxa using a coalescent simulation program 




4) Record parameter values and calculate summary statistics (number of haplotypes, haplotypic diversity, 




nucleotide diversity, and Tajima's D) for each of the n taxa 


Calculate hypersummary statistics 


5) Calculate summary statistics for each set of n taxa based on the mean, variance, skewness, and kurto- 




sis to compress multitaxon data set into Dj 




6) Repeat steps 1-5 for X iterations 




7) Compute hypersummary statistics from the observed multitaxon data set D* 


Accept/Reject 


8) Accept Dj for 1,000 smallest Euclidean distances between simulated and observedlD^— D*l and record 




hyperparameter values 5, £(t), Tj, and Var(T)/E(T). Reject the remaining 


Estimation 


9) Fit local linear regression model to 1,000 accepted data sets and adjust hyperparameter values to 




obtain joint posterior probability estimates of £(t), Var(T)/£(T),Ts, and 



applying our method, we simulated PODs from communities examined the behavior of three sets of PODs with key differ- 
that differed from our coexpansion and random expansion ences from the model used to generate prior samples con- 
community by including nonexpanding populations, declin- tained in the reference table. These three sets of PODs were 
ing populations, and multiple congruent pulses, and per- generated under the following three models: 1) "constant" — 
formed graphical checks using principal component analysis 15 taxa expansion at the same time and 35 taxa with zero 
(PCA) given the PODs and data sampled from the hyperprior. population growth; 2) "declining" — 15 taxa expansion at the 
Finally, we apply this method to an empirical data set of 32 same time and 35 taxa declining to one-tenth their original 
avian populations from Australia to determine how many size at random times; and 3) "two pulse" — 15 taxa expansion 
bird species coexpanded simultaneously and the timing of at 30,000 generations before present and 35 taxa expansion at 
any coexpansion. We validate our estimates with PODs 
using the sampling and species-specific prior distributions 
of these 32 avian populations and conduct graphical model 
checking of the prior and posterior predictive distributions. 
Thus, we verify that the model can generate the main features 
of the observed data. We then use PCA to help identify which 
of the 32 species coexpanded, followed by an hABC analysis 
on this subset to confirm species identification and validate 
the temporal coexpansion, and finally, to estimate when the 
coexpansion occurred. 

Results 

Cross-Validation of hABC Method 
Sensitivity analyses to the size of the community (10 vs. 50 
species) and the time frame (maximum expansion time 
Tmax= 100,000 or 500,000 years before present) show that, 
as expected, the estimates of mean time £(t) and dispersion 
index of time (Var(T)/£(T)) are substantially more accurate 
when a greater proportion of the taxa are coexpanding (fig. 2 
and table 2). In addition, although the number of taxa sam- 
pled from a community (10 vs. 50) does have a strong effect 
on being able to test for synchronous or asynchronous coex- 
pansion, the timeframe is much less so. 

Model assumptions are an important component of all 
model-based methods (Csillery et al. 2010), therefore we 
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90,000 generations before present. Using PCA, we compared 
data generated from each of these models to data generated 
from a model with 15 species coexpanding at a single time 
and 35 species expanding at independent times (i.e., ^ = 0.3) 
and expansion magnitudes e, drawn from the uniform prior 

^(^mitr ^max)- 

Using PCA as a graphical check to compare the PODs and 
data generated from the prior, we found that data generated 
from the constant and declining models can be detected with 
PCA to be a poor fit for hABC analysis under our coexpansion 
model (fig. 3A and B). In contrast, this PCA technique sug- 
gested that data generated from the two-pulse model were 
not found to be distinguishable from data generated from our 
single pulse ^ - 0.3 model (fig. 3C). 

Pleistocene Expansion Times in Australian 
Avian Populations 

To demonstrate the method on an empirical data set, we 
applied this hABC method to data from 32 Australian bird 
populations. For this analysis, we specifically chose species in 
which all samples represented a monophyletic cluster lacking 
in genetic structure and putatively expanded from a single 
refugium. After using PCA to conduct graphical model checks 
of the prior and posterior predictive distributions to verify 
that the model can generate the main features of 
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Fig. 1. Depiction of models of synchronous coexpansion (A; ^ = 1.0), 
partially synchronous coexpansion (B; 1^ = 0.4), and asynchronous expan- 
sion (C; ^ = 0.0) all involving five populations. Each population has a 
contemporary effective population size of Nj from prior U(Nmin/ N^^) 
and expands instantaneously from a population £, the current size 
chosen from expansion magnitudes L/(£min/ £max) at times r, or Tj 
from the uniform prior of U{r^i„, tmax)- 



the observed data (fig. 4), we estimated ^ to be 0.8 
(95% quantiles = 0.3-1.0; fig. 5A), thereby corresponding to 
an estimate of 26 of 32 populations temporally coexpanding 
(95% quantiles = 10-32). The mode for the mean expansion 
time £(t) was 54,321 (95% quantiles = 38,918-71,287) and of 
the time of the coexpansion Tj was 35,225 (95% 
quantiles = 18,963-67,545; fig. 5C and D). 



The hABC estimates of ^, given five sets of 100 PODs each 
simulated from fixed values of ^ (0.0, 0.25, 0.5, 0.75, and 1.0), 
demonstrate that the method can detect the degree to which 
a community coexpanded at a single time, as well as estimate 
the expansion time and dispersion index of expansion times 
(fig. 6). For example, we simulated 100 PODs under the fixed 
value of ^ = 0.25, thereby yielding 8 taxa expanding synchro- 
nously at time and 24 taxa expanding at random times t,, 
with the expansion times drawn from priors Pr(T5) or Pr(T,) of 
L/(1,000, 200,000) generations ago. Each estimate of ^ for each 
POD is based on 1,000 acceptances from a reference data 
table of 6.6 million simulations with ^ drawn from the discrete 
uniform hyperprior of Pr(Q = [0.0, 1/32, . . . , 31/32, 1.0]. In this 
case, the reference table consists of 200,000 data sets simu- 
lated from each of the 33 possible values of the hyperprior ^ 
ranging from 0 to a total of 32 populations expanding con- 
gruently with parameters for each population drawn from 
species-specific priors. The cross-validation PODs and error 
estimates demonstrate that the hABC method can poten- 
tially obtain accurate estimates of ^, £(t), and Var(T)/E(T). 

As a heuristic approach to identify which 10-32 of the 32 
avian populations coexpanded, we subsequently conducted 
an exploratory PCA. Specifically, we chose a cluster of 26 
populations plotted on the first two components to identify 
a putative subset of populations that synchronously coex- 
panded (fig. 7C). We then assessed whether this subset of 
populations have data that are consistent with a history of 
temporal coexpansion by running an hABC analysis with a 
new reference table and observed summary statistic vector D 
regenerated from the subset of 26 populations corresponding 
to the PCA cluster. The estimates of ^, Var(T)/£(r), and on 
this subset yield strong support for complete synchronous 
coexpansion (mode estimate of ^ = 0.95, fig. 7) and agree with 
the estimate of coexpansion time derived from the previous 
analysis (mode estimate of 32 populations 35,225 vs. 
mode estimate of 26 populations - 33,465, 95% 
quantiles = 26,059-50,961; fig. 7). 

Discussion 

The prevalence of shallow mtDNA genealogies found in many 
terrestrial and marine communities (Grant and Bowen 1998; 
Hewitt 2000) suggests that there may be widespread tempo- 
rally and spatially shared responses of communities to 
Pleistocene environmental fluctuations such as sea level 
changes or glacial retreats. A statistical framework to 
test this hypothesis at the multispecies level will enable 
community-level inference for understudied groups and 
across whole biota. In contrast to previous applications of 
hABC to comparative phylogeographic data sets which 
sought to quantify support for models of vicariance or dis- 
persal (Hickerson and Meyer 2008) or test for synchronous 
isolation (Huang et al. 2011), the method we present here 
enables quantifying the amount of temporal congruence in 
demographic coexpansion across species. Discerning whether 
species responded in concert or individualistically to historical 
changes in climate and landscape is important with respect to 
understanding the biogeographic dynamics of future climate 
changes, invasions, and extinction (Lavergne et al. 2010). 
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Fig. 2. Simulation validation of the ABC estimator of the mean and dispersion index of the expansion times across 10 (panels A, B, £, and F) or 50 
(panels C, D, G, and H) populations (£(t) and Var(r)/£(T), respectively). Each joint estimate was made on a POD with parameter values randomly drawn 
from the priors Pr(r) = L/(1,000, 100,000) generations. Sets of PODs were drawn from histories with all populations expanding synchronously = 1.0; 
panels A-D) and asynchronously (^ = 0.0; panels £-H). True value for Var(T)/£(T) = 0 when ^ = 1.0. 



Table 2. Comparison of Different Numbers of Species and Expansion 
Time Priors Indicates that Using Greater Numbers of Species Lowers 
the Error in Detecting Simultaneous Expansion. 

True Number Tmax Error Rate Error Rate Error Rate 
Model (Q of Species (majority) (Bayes bctor (Bayes factor 



threshold > 3) threshold > 10) 



1 


10 


100,000 


0.08 


0.29 


0.51 


0 


10 


100,000 


0.30 


0.62 


0.87 


1 


10 


500,000 


0.07 


0.30 


0.67 


0 


10 


500,000 


0.28 


0.60 


0.80 


1 


50 


100,000 


0.00 


0.01 


0.01 


0 


50 


100,000 


0.03 


0.14 


0.29 


1 


50 


500,000 


0.02 


0.04 


0.06 


0 


50 


500,000 


0.00 


0.00 


0.00 



Note. — Error rates are calculated using either the majority of posterior samples or 
the Bayes Factor that compares the posterior weights of t^ = 0.0 and l^ = ^.0. Errors 
were determined from estimates of PODs generated under ^ = 0.0 and = 1.0 for 
communities of 10 and 50 species and z^nax of 100,000 and 500,000 generations 



Although we focus on comparative demographic inference, 
interpretations from this method could focus on demo- 
graphic histories of matrilineages or selective sweeps accom- 
panied by hitchhiking (Gillespie 2001; Bazin et al. 2006). 

This general method and downstream extensions will be 
able to be deployed on the wealth of mtDNA, single-copy 
nuclear DNA, and chloroplast spacer regions that have 
become available for animals, plants, protists, and fungi 
over the last quarter of a century for phylogeographic 
(Soltis et al. 2006) and mtDNA studies (Ratnasingham and 
Hebert 2007), as well as newly available ancient DNA data 
(Ramakrishnan and Hadly 2009; Ho and Gilbert 2010; 
Lorenzen et al. 2011). Even if only a single genealogical 
sample is available from each animal species, the opportunity 



to sample this locus within communities across wide taxo- 
nomic breadth enables researchers to make regional-scale 
inferences of how whole assemblages responded to the cycli- 
cal climate changes of the Pleistocene, all while accounting for 
the variability associated with using a single locus per species. 

The demographic model involves synchronous and asyn- 
chronous expansion where each taxon consists of a single 
panmictic population that is expanding. Although hetero- 
geneity in sample size, sequence length, mutation model, 
and generation time, are incorporated into the simulations 
and taxon-specific parameters such as mutation rate and 
effective population sizes are allowed to freely and indepen- 
dently vary across taxa, we suggest that when applying this 
method, taxa/populations should be chosen that are likely 
to be panmictic. If taxa are found to have population struc- 
ture, the data set can be divided into populations and each 
subset included separately in the analysis, although one 
should be cautious in cases where there are an unknown 
number of unsampled ghost populations that could cause 
distortions in inference (Heller et al. 2013). As with other 
model-based methods, it is important to perform model- 
checking procedures, such as posterior predictive checks 
and cross-validation simulation experiments, to determine 
whether parameters can be accurately estimated given the 
empirical data (Gelman et al. 2004; Csillery et al. 2010). 

To examine the robustness of our model estimation pro- 
cedure, we quantified the impact of the prior on posterior 
distributions, parameter inference, and model choice 
(Gelman et al. 2004). We found that the timescale of the 
prior had little influence on the model choice or parameter 
estimation but that error rates for model choice were lower 
with 50 species compared with 10 species (table 2). To deter- 
mine goodness-of-fit and examine whether misspecified 
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Fig. 3. Projection of the summary statistics calculated from 100 PODs from constant, declining, and two-pulse models. (A) Constant: 15 species 
expanding congruently at 1,000-100,000 generations before present and 35 species not expanding ( ♦ ). (B) Declining: 15 species expanding congruently 
at 1,000-100,000 generations before present and 35 species declining at 1,000-100,000 generations before present (♦). (C) Two-pulse model that 
included 35 species expanding congruently at 30,000 generations before present and 15 species expanding congruently at 90,000 generations before 
present (♦); and 100 samples from the hyperprior (o) of the expansion model where 15 species are expanding congruently at 1,000-100,000 
generations before present and 35 are expanding at random times 1,000-100,000 generations before present onto the first two axes of a PCA. 



models (i.e., when the actual model generating the data does 
not match the model used for inference) could be detected, we 
performed graphical model checking on PODs from constant, 
declining, and two-pulse models. We found that a PCA com- 
paring summary statistics calculated from PODs generated 
from these alternative models and summary statistics calcu- 
lated from data generated from random draws from the prior 
model used for inference (i.e., our expansion model) was effec- 
tive for identifying cases of poor model fit for the constant and 
declining population models (fig. 3A and B) but not the two- 
pulse model (fig. 3C). However, in cases where the data are 
generated under a two-pulse model, one can first use our 
method to detect asynchronous coexpansion and subse- 
quently explore running the framework on subsets of the 
data to identify multiple pulses of coexpansion (fig. 7). 

To minimize bias introduced by violations of the assump- 
tion of panmictic populations (Navascues and Emerson 2009; 
Heller et al. 2013), we demonstrate the method on an avian 
assemblage from Australia, where species typically have high 



dispersal and consequently simplified patterns of population 
structure. In Australia, many common widespread species 
across the continent may have had assemblage-wide bottle- 
necks in response to widespread climatic events in the late 
Pleistocene (Barnosky et al. 2004), and our method suggests 
that a subset of populations potentially coexpanded 26,000- 
51,000 generations ago, coinciding with Marine Isotope Stage 3, 
a highly variable climatic period (Siddall and Rohling 2008). 

Although hABC enabled the estimation of the number of 
species in the Australian data set that share congruent de- 
mographic dynamics, the identification of individual species 
by PCA shown here is heuristic, and there are likely to be 
other subsets that may also yield high posterior probability of 
synchronous coexpansion. Beyond such exploration with 
PCA, one could examine parameter estimates from individual 
species analysis using Bayesian skyline plots or mismatch dis- 
tribution methods (Rogers and Harpending 1992; Kuhner 
et al. 1998; Schneider and Excoffier 1999; Ray et al. 2003; 
Excoffier 2004; Ho and Shapiro 2011). Whatever method is 
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Fig. 4. Projection of the summary statistics calculated from 1,000 samples from the prior (A), posterior (B), and posterior predictive distributions (C and 
D) onto the first two axes of a PCA. The open circle is calculated from the observed Australian data collected from 32 avian populations. Panels (A-C) 
use the 16 original summary statistics used for the parameter estimates (mean, variance, skewness, and kurtosis of the number of haplotypes, haplotypic 
diversity, nucleotide diversity, and Tajima's D) and panel (D) uses an alternative set of 8 summary statistics (mean, variance, skewness, and kurtosis of the 
number of segregating sites and Fu's F). In panels (C) and (D), the 1,000 sets of accepted parameter values conditioned on the observed data to 
approximate the posterior were subsequently used to resimulate the data 1,000 times to obtain the posterior predictive distributions (Celman et al. 
2004; Cornuet et al. 2010). 



used to identify the individual species, the subset of congru- 
ent species can then be verified with an hABC analysis on the 
putative subset and with resulting parameter estimates of 
^ « 1.0 and Var(T)/£(T) % 0 providing confirmation that 
the subset of species expanded synchronously, although com- 
peting models with alternate subsets may be unidentifiable 
given single locus mtDNA data. 

We model an assemblage history that involves one 
synchronous expansion plus a group of asynchronous expan- 
sions, although future approaches could incorporate more 
than one synchronous expansion pulse and/or pulses or 
contraction pulses. The use of a more realistic model of 
population growth such as an exponential model may also 
improve inference, although our instantaneous model is likely 
to capture the effects of an exponential model (Rogers and 
Harpending 1992). Greater model complexity may also be 
accommodated by the incorporation of high throughput 
population genomic data obtained from next-generation se- 
quencing technology. 



Conclusion 

We present a new hABC method for demographic inference 
and testing community response to climate change. This 
framework enables the estimation of the proportion of the 
present-day community that expanded in a single pulse in the 
past and will further aid in fitting predictive species distribu- 
tional models of shared demographic and distributional 
changes (Lorenzen et al. 2011; Prunier et al. 2012). We test 
this method on an empirical data set of avian populations 
from Australia and demonstrate our ability to detect and 
estimate the timing of the expansion pulse. This method 
could help test macroecological hypotheses of community 
assembly (Weiher et al. 2011) and identify different responses 
by different ecological guilds or interacting species (Emerson 
and Gillespie 2008; Mikheyev et al. 2008; Stone et al. 2012). 
Overall, this approach can be extended to enable detecting 
forces underlying regional biodiversity patterns and provide a 
greater understanding of how changes in historical 
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Fig. 5. Posterior estimates of model index of community congruence hyperparameter ^ (panel A) and parameter summaries time dispersion index 
Var(r)/£(T) (panel B), total mean expansion time £(r), and coexpansion time Tj (panels C and D) conditional on data from 32 Australian avian 
populations. The posterior of ^ with a mode between 0.7 and 0.8 and Var(T)/£(T) »0.0 indicates a mixed history that includes populations that 
expanded asynchronously and a synchronous pulse of coexpansion in a subset of the populations. 



environmental features differentially affected species distribu- 
tions (Hickerson et al. 2010; Emerson et al. 2011). 

Materials and Methods 

The hABC Model 

Similar to the hierarchical Bayesian formulation of lives et al. 
(2010), which was used to estimate the proportion of taxon 
pairs that arose by way of colonization rather than "soft vicar- 
iance," our objective here is to estimate the community con- 
gruence hyperparameter ^, the proportion of n taxa that 
coexpanded in a single synchronous cluster. A flow chart of 
our method is oudined in table 1. The hierarchical structure of 
the model is such that the data D is conditional on species- 
specific parameters cp and r, whereby the expansion time pa- 
rameters r, for 2, . . . , n species are conditional on ^, 
whereas the other species-specific parameters cpj (/x, , N, , and 
£,; mutation rates, current effective population sizes, and ex- 
pansion magnitudes, respectively) are assigned to ;' = 1, 2, . . . , n 



species independent of ^. Under this scheme, the posterior 
distributions of the parameters given the data are 
7r(C,r I D)(xP(D | I^tMt \ QttQ, and 7r((/) | D)cxP(D | 0) with 
the joint hierarchical posterior distribution of the parameters 
given the data being 7r(^ t,(/) | D)cxP(D | ^,t,0) = P(D | </),t) 
7r((/))7r(T I Q7r(Q. The community congruence hyperpara- 
meter ^ is then modeled, such that it can take k-n + ^ pos- 
sible values ^{0.0, 1/n, 2/n, n/n] that are assigned prior 
probabilities drawn from a discrete uniform distribution 7r(Q. 
In this case, each model within the vector {^c • ■ ■ - Ci} has 
equal prior probability, and the expansion time parameters t, 
are assigned to each of the n species conditional on ^, 7T{r \ Q. 

Under the ABC scheme, this hierarchical Bayesian mixture 
model is used to simulate the data D in three steps: 1) gen- 
erating the model from 7t{Q with each ^ value having equal 
probability P(Co.o)' ■ ■ ■ - PKi.o); 2) generating the parameter 
vectors (p^ and r^- from 7t{(p) and jT{r \ Qjt{Q; and 3) gener- 
ating the data D from P(D | (phTk). By conditioning on the 
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Fig. 6. Error estimates from simulation validation of the hABC estimator of model index of community congruence C, using local linear regression (panel 
A), simple rejection (panel B), coexpansion time (panel C), and Var(T)/£(r) time dispersion index of all expansion times (panel D). A set of 100 PODs 
was generated for five different levels of community congruence; true ^ = 1.0 where all species expanded in a synchronous pulse, true ^ = 0.0 where all 
species expanded randomly, and intermediate levels of synchronous and asynchronous expansion ^ = 0.25, 0.5, 0.75. Each set of 100 PODs was based on 
the sampling and priors of 32 Australian bird populations with expansion time randomly drawn from the prior Pr(T) = L/(1,000, 200,000). 



data D, this hierarchical ABC model then yields the approx- 
imate posterior probabilities of the focal hyperparameter^ and 
parameter summaries E{r),r^, and Var(T)/£(T), while allowing 
other taxon-specific parameters to be drawn independently 
for each taxon. For an analysis involving n taxa, these taxon- 
specific parameters cp include the n current effective popula- 
tion sizes {Ny . . ., N„} drawn from a uniform prior 
P(Nfc) = ^UCNmin. Nmax)' the H expansion magnitudes {e^ . . . , 
£„} drawn from a uniform prior P(efc) = ~U(emin, fimax). and the 
n mutation rates {jiy . . . ,iJ,„} that are in turn drawn from a 
uniform prior P(/xj() = ~U(/Limin, Mmax)/i with each locus-spe- 
cific mean chosen according to the data and estimates based 
on the literature (table 3, Nabholz et al. 2008). With regard to 
each taxon's expansion times t^, = {ti, . . . ,r | }, the 

II (1— Qn II , independently expanding taxa are each indepen- 
dently and randomly assigned an expansion time from the 
uniform prior P(Ta) = ~ L/(Tmin-Tmax)' whereas the ||n^|| 
taxa synchronously coexpanding are assigned a single that 



is randomly drawn from the same uniform ~U{r^;„, t^ax) 
(||.. II notation specifies the integer nearest the number 
within). 

Importantly, all n taxa experience an instantaneous expan- 
sion of magnitude at or ta-l^v . . . ,t ki-^),, | } gener- 
ations in the past to reach their respective effective size N^, 
such that each taxon's effective population size at time or 
times Ta-{Tv . . . ,T I 1(1-;;),, | | } IS (eN)(, < Nj, as depicted in 
figure 1 under the three different hypothetical scenarios of 
^ = 1.0, 0.4, and 0.0, respectively. As a practical way to estimate 
the hyperparameter model indicators for ^o.o- ■ ■ ■ -Ci.o from 
the data, we choose the hyperprior Pr(^j-) to be a simple 
discrete uniform prior P(^,j) = to favor all models equally 
and where the posterior is proportional to the likelihood 
7rK,T,0 I D)(xP(D I ^,r,(/)) = P(D | 0,T)7r(^)7r(T | Qtt{Q. In this 
case, there is one model for each number of possible taxa 
coexpanding, such that if there are 100 taxa, then there are 
101 models of ^fc. 
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Fig. 7. Posterior estimates of hyperparameter C, and parameter summaries Tj and Var(T)/£(T) conditional on data from the subset of 26 Australian bird 
populations that were identified by the PCA to have synchronously coexpanded. The ABC posterior distributions indicate that these 26 populations 
expanded synchronously (panels A and B) approximately 33,000 years ago (panel D) during Marine Isotope Stage 3. Panel C depicts the first two 
principal components calculated from the 16 summary statistics calculated from the 32 Australian avian populations. The tightest cluster of 26 species 
was subsequently tested as a putative subset of species that synchronously coexpanded. 



Using this hABC approach, we compress an observed 
multitaxon data set into a summary statistic vector (D*) 
and condition on D* to obtain Tt{t^r,(p \ D*), the approxi- 
mate joint posterior probabilities of the expansion times Tj. 
(summarized by posterior estimates of £(t), Var(T)/E(T)) 
and ^, the proportion of the taxon assemblage that ex- 
panded synchronously. In this case, the data consist of a 
sample of multiple alleles from a single locus from each of 
n taxa assumed to be panmictic. 

Multitaxa coalescent simulations were performed using a 
python script that combined single taxa simulations and 
summary statistics from Bayes Serial Simcoal (Anderson 
et al. 2005) into a single file and calculates the multispecies 
summary statistics. Python scripts for the simulations are 
available at https://github.com/UH-Bioinformatics/hBayeSSC 
(last accessed June 16, 2014). 

Summary Statistics 

Following our objectives of quantifying temporal patterns of 
coexpansion, we construct multispecies summary statistics 



that are based on four summary statistics that are 
known to be correlated with demographic expansion, 
including number of haplotypes, haplotypic diversity, 
nucleotide diversity, and Tajima's D. To ensure that the 
summary statistic vector D, used for the ABC procedure, is 
independent of the ordering of the data configuration, and to 
reduce the dimensionality in D, we use the first four sample 
moments of these four summary statistics calculated on 
each of the four summary statistics across the n taxa. 
Specifically, we use the mean, variance, skewness, 
and kurtosis of each of these four summary statistics 
thereby yielding a vector D consisting of 16 order-indepen- 
dent multitaxa summary statistics. To initially explore the 
statistical behavior of the 16 components of D*, we simulated 
data from 50 taxa and a prior approximately L/(1,000, 500,000) 
for T under models ^=0.0 and ^ = 1.0. These conditions 
demonstrated the summary statistics to have differ- 
ing distributions under these two models, a favorable condi- 
tion for summary statistic selection in ABC (supplementary 
figs. SI and S2, Supplementary Material online; Marin et al. 
2012). 
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hABC Estimation 

For obtaining the ABC joint posterior of and the summaries 
of the n expansion times (£(t), r^, Var(T)/£(T)), we use rejec- 
tion sampling to identify the 10,000 closest Euclidean dis- 
tances between D* and D, calculated from 2.0 x 10^ 
random draws from each of the models P(^(.) - U{i^o< ■ ■ • Ci) 
with equal prior probability. After a second step of rejection 
sampling to identify the 1,000 closest Euclidean distances be- 
tween D* and D„ we fit a local linear regression model to these 
posterior draws and applied this model to these remaining 
1,000 acceptances to obtain an adjusted estimate of the joint 
posterior probability of £(t), t^, Var(T)/£(T), and using 
functions available from the R package abc.r (Csillery et al. 
2012). The local linear adjustment leads to improved estima- 
tion over estimates obtained by simple rejection sampling 
(fig. 6A and B). For we then back-transform the adjusted 
estimates to the closest value contained in the discrete uni- 
form prior P(^(c) due to local linear regression leading to ad- 
justed values of ^ that are not contained in the prior. For 
example, if the prior contains 0, 1/50 - 0.02, 2/50 - 0.04, . . . , 
50/50 - 1, and the parameter value sampled in the posterior 
after the regression adjustment is 0.025, this value is set to 0.02 
(1/50). 

Simulation Cross-Validation of ABC Estimation 
The development of novel ABC methods require careful ex- 
amination of bias and accuracy, and especially in the case of 
using ABC for model estimation, these developments must go 
hand in hand with validating simulation experiments (Cook 
et al. 2006; Bertorelle et al. 2010; Cornuet et al. 2010; Csillery 
et al. 2010; Robert et al. 201 1). To this end, we first simulated 
10-species and 50-species models, while varying the priors for 
expansion times using a P(Tmin. i^max) of L/(1,000, 100,000) and 
L/(1,000, 500,000) generations. The sample configuration of 
the 10 and 50 species consisted of 40 sequences of 800 bp of 
single-locus mtDNA data and a fixed locus mutation rate of 
8 X 10^^ 

Initially, the prior distribution for ^, P(Q, only 
considered the two extreme values of ^ = 0.0 and ^ = 1.0, by 
simulating 500,000 prior draws from these two models. We 
then independently simulated 100 PODs (Bertorelle et al. 
2010) from each of these two models and then used the 
ABC procedure to yield the 1,000 closest Euclidean distances 
between each PODs' D* and D, calculated using the data 
generated from each of the 500,000 prior draws and used 
this to approximate the posterior probabilities of ^ = 0.0 
and ^ = 1.0 for each estimate. Additionally, we used the 
Bayes Factor P(^ =0.0 1 D*)/P(^ = 1.0 | D*) ^ P(C = 0.0)/ 
P(^ = 1.0) or P(^ = 1.0 I D*)/P(^ = 0.0 I D*) ^ P(C = 1.0)/ 
P(^=0.0) (Kass and Raftery 1995) and the associated 
Jeffrey's scale (Jeffreys 1961) to gauge one's support for 
either history (^ = 0.0 or ^ = 1.0). As an additional assessment, 
we estimated Var(r)/£(T) and £(t) from 100 PODs condi- 
tional on ^ = 0.0 and ^ = 1.0, and either of the two numbers 
of populations (10 and 50) and priors for P(t) = L/(1,000, 
100,000) and 17(1,000, 500,000). 



Detecting Poor Model Fit with Graphical Checks 
We used the samples from the prior to check that our model 
priors could generate summary statistics similar to those cal- 
culated from PODs generated from alternative models 
(Cornuet et al. 2010). We performed a PCA using the 
"prcomp" function in R (R Development Core Team 2008) 
on the 16 multitaxa summary statistics from each of 100 
PODs simulated from our hyperprior conditional on ^ = 0.4 
given a history of 15 coexpanding taxa at 1,000-100,000 gen- 
erations before present and 35 taxa expanding at indepen- 
dent times and 100 PODs generated from three alternative 
model priors: 1) constant — 15 taxa coexpanding at 1,000- 
100,000 generations before present and 35 taxa not expand- 
ing; 2) declining — 15 taxa coexpanding at 1,000-100,000 
generations before present and 35 taxa declining at 1,000- 
100,000 generations before present; and 3) two pulse — 35 
taxa coexpanding at 30,000 generations before present and 
1 5 taxa coexpanding at 90,000 generations before present. For 
the graphical checks, we specifically focused on PCI and PC2 
for comparison. 

Pleistocene Expansion Times in Australian Avian 
Populations 

We apply our method to a set of 32 avian populations dis- 
tributed from across the Australian continent, most of which 
have been previously published and analyzed (table 3). Each 
population consists of a monophyletic cluster within which 
the population can be considered to be demographically pan- 
mictic. Each population putatively expanded from a single 
refugium during the Pleistocene, and our method attempts 
to discern if any of them coexpanded temporally without 
making any spatial assumptions or inferences. Following the 
general hierarchical ABC procedure, we simulated from the 
discrete uniform prior of P(^fc) = L/(^o- - ■ - -Ci)' with 200,000 
simulations from each of the 33 coexpansion models, which 
ranged from 0 to 32 species coexpanding at time (total 6.6 
million simulations). Following this hABC structure, Tj and 
Ta-lty . . . ,r \ {i-()r^ |} are randomly drawn from the uni- 
form prior P(t) = U( 1,000, 200,000). Likewise, the effective 
size N, of each of the 32 contemporary populations is inde- 
pendently drawn from a species-specific uniform prior 
L/(Ne,_min, Ne,_max) that instantaneously expands from an ef- 
fective population Sj its size at time or time 
= {^1/ . • • I I }. Mutation rates were drawn from a 
uniform prior distribution of U{firn\n< Mmax) that were based 
on previously reported locus and taxon-specific rates (table 3; 
Nabholz et al. 2008). After 6.6 x 10^ simulated random draws 
from the prior, the ABC filter of 1,000 closest Euclidian dis- 
tances between observed D*, and the 6.6 x 10^ simulated 
values of Dj are retained. Subsequently, we fit a local linear 
regression model to these posterior draws and apply this 
model to obtain an adjusted estimate of the joint posterior 
probability of ^z^, E{r),rs, and Var(T)/£(r). To further ascertain 
our ability to estimate the number of coexpanding popula- 
tions, we estimated ^, £(t), r^, and Var(T)/£(T) from sets of 
100 PODs that were drawn fi-om ^=0.0, ^ = 0.25, ^=0.5, 
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^ = 0.75, and ^ = 1.0 and calculated the error as | (true 
value — estimated value) | . 

Model Checking of the Prior and Posterior Predictive 
Distributions 

As a goodness-of-fit check to ascertain whether the model 
and our chosen priors can produce the main features of the 
observed data and to check for prior sampling efficiency 
(Hickerson et al. 2014), we deployed a PCA on prior and 
posterior samples of the 16 summary statistics. Initially, we 
used the first two principal components of the summary 
statistics calculated from 1,000 random draws from the sim- 
ulated prior distribution and 1,000 samples from the hABC 
posterior to compare with these first two components from 
the 32 avian population samples (D*). Second, we sample 
from the posterior predictive distribution (Gelman et al. 
2004) by simulating 1,000 data sets using parameters from 
the 1,000 posterior samples. To this end, we used the same 16 
summary statistics, the mean, variance, skewness, and kurtosis 
of the number of haplotypes, haplotypic diversity, nucleotide 
diversity, and Tajima's D ("training" set) as well as an alterna- 
tive set ("testing" set) that included the mean, variance, skew- 
ness, and kurtosis of the number of segregating sites and Fu's F 
and again projected the first two principal components of the 
corresponding observed summary statistics in both cases. 

Selecting Taxa with Shared History 
After the posterior estimate of ^ was obtained, given the 
observed D* from 32 avian population samples, we used a 
PCA on the summary statistics to aide in selecting and test- 
ing which ( \r]^ I ) populations coexpanded synchronously. 
Specifically, we calculated the first two principal components 
of the summary statistics calculated from 32 avian population 
samples. After recalculating D* on a subset of populations 
that are closely clustered in the PCA, we then perform an 
hABC analysis to determine if this subset of populations plau- 
sibly coexpanded synchronously. 

Supplementary Material 

Supplementary figures SI and S2 are available at Molecular 
Biology and Evolution online (http://www.mbe.oxfordjournals. 
org/)- 
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